Towards Geo-Distributed Machine Learning

Authors

  • Ignacio Cano
  • Markus Weimer
  • Dhruv Kumar Mahajan
  • Carlo Curino
  • Giovanni Matteo Fumarola
  • Arvind Krishnamurthy
Abstract

Latency to end-users and regulatory requirements push large companies to build data centers all around the world. The resulting data is “born” geographically distributed. On the other hand, many machine learning applications require a global view of such data in order to achieve the best results. These types of applications form a new class of learning problems, which we call Geo-Distributed Machine Learning (GDML). Such applications need to cope with: 1) scarce and expensive cross-data center bandwidth, and 2) growing privacy concerns that are pushing for stricter data sovereignty regulations. Current solutions to learning from geo-distributed data sources revolve around the idea of first centralizing the data in one data center, and then training locally. As machine learning algorithms are communication-intensive, the cost of centralizing the data is thought to be offset by the lower cost of intra-data center communication during training. In this work, we show that the current centralized practice can be far from optimal, and propose a system for doing geo-distributed training. Furthermore, we argue that the geo-distributed approach is structurally more amenable to dealing with regulatory constraints, as raw data never leaves the source data center. Our empirical evaluation on three real datasets confirms the general validity of our approach, and shows that GDML is not only possible but also advisable in many scenarios.
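The trade-off described in the abstract can be illustrated with a back-of-the-envelope comparison of cross-data center traffic. The sketch below is not the system proposed by the authors: the communication scheme (each non-central data center uploads one model-sized update and downloads the merged model once per round), the data center names, and all sizes are hypothetical assumptions chosen only to show the arithmetic.

```python
# Illustrative cost model only; the scheme and all numbers are assumptions,
# not figures or methods taken from the paper.

def centralized_cross_dc_bytes(data_bytes_per_dc, central_dc):
    """Bytes crossing data centers when every other site ships its raw data to central_dc."""
    return sum(size for dc, size in data_bytes_per_dc.items() if dc != central_dc)


def distributed_cross_dc_bytes(num_dcs, model_bytes, rounds):
    """Bytes crossing data centers when each non-central site uploads a model-sized
    update and downloads the merged model once per round (assumed scheme)."""
    return 2 * model_bytes * (num_dcs - 1) * rounds


GB = 10**9
# Hypothetical raw data held at each data center.
data = {"us": 500 * GB, "eu": 300 * GB, "asia": 200 * GB}

# Centralize at the largest site so the least raw data has to move.
central = max(data, key=data.get)

centralized = centralized_cross_dc_bytes(data, central)
distributed = distributed_cross_dc_bytes(len(data), model_bytes=1 * GB, rounds=50)

print(f"centralized: {centralized / GB:.0f} GB, distributed: {distributed / GB:.0f} GB")
```

Under these assumed numbers the distributed scheme moves less data, but the outcome flips if the model is large relative to the raw data or if many rounds are needed, which is why the comparison has to be made per scenario.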


Similar resources

KNN Regression as Geo-Imputation Method for Spatio-Temporal Wind Data

The shift from traditional energy systems to distributed systems of energy suppliers and consumers and the power volatility of renewable energy imply the need for effective short-term prediction models. These machine learning models are based on measured sensor information. In practice, sensors might fail for several reasons. The prediction models naturally cannot work properly with inc...


Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds

Machine learning (ML) is widely used to derive useful information from large-scale data (such as user activities, pictures, and videos) generated at increasingly rapid rates, all over the world. Unfortunately, it is infeasible to move all this globally-generated data to a centralized data center before running an ML algorithm over it—moving large amounts of raw data over wide-area networks (WAN...


Global Warming: New Frontier of Research Deep Learning- Age of Distributed Green Smart Microgrid

The exponential increase in carbon dioxide and the resulting global warming would make planet Earth uninhabitable in many parts of the world, with ensuing mass starvation. The rise of digital technology all over the world has fundamentally changed the lives of humans. The emerging technology of the Internet of Things (IoT), machine learning, data mining, biotechnology, biometric, and deep le...


On-line Spectral Learning in Exploring 3D Large Scale Geo-Referred Scenes

Personalized navigation of 3D large scale geo-referred scenes has a tremendous impact on digital cultural heritage. This is a result of the recent progress in digitization technology, which has led to the creation of massive digital geographic libraries. However, an efficient personalized 3D geo-referred architecture requires intelligent and on-line learning strategies able to dynamically capture ...


Geo-Coded Environment for Integrated Smart Systems

Smart Systems provide novel enabling functionalities and as such are currently a driving force behind product innovation. Smart Systems are, therefore, crucial for the competitiveness of companies and entire industry sectors. Geo-tagging and smart spaces are two promising directions in the modern mobile market. Geo-tagging allows any kind of data to be marked up with geographical coordinates and time. This...



Journal:
  • IEEE Data Eng. Bull.

Volume 40

Publication date: 2017